On Sharing Viral Video Over an Ad Hoc Wireless Network



Yi-Ting Chen, Constantine Caramanis and Sanjay Shakkottai 
Department of Electrical and Computer Engineering 
The University of Texas at Austin 
Email: {yiting.chen, caramanis, shakkott}@mail.utexas.edu 






Abstract — We consider the problem of broadcasting a viral 
video (a large file) over an ad hoc wireless network (e.g., students 
in a campus). Many smartphones are GPS enabled, and equipped 
with peer-to-peer (ad hoc) transmission mode, allowing them to 
wirelessly exchange files over short distances rather than use the 
carrier's WAN. The demand for the file, however, is transmitted 
through the social network (e.g., a YouTube link posted on 
Facebook). 

To address this coupled-network problem (demand on the 
social network; bandwidth on the wireless network) where 
the two networks have different topologies, we propose a file 
dissemination algorithm. In our scheme, users query their social 
network to find geographically nearby friends that have the 
desired file, and utilize the underlying ad hoc network to route 
the data via multi-hop transmissions. We show that for many 
popular models for social networks, the file dissemination time 
scales sublinearly with n, the number of users, compared to 
the linear scaling required if each user who wants the file must 
download it from the carrier's WAN. 

I. Introduction 

The proliferation of mobile devices that can stream video 
(laptops, smartphones, tablets) has brought a dramatic increase 
in demand for streaming video. At the same time, content 
generation and dissemination have become dramatically easier - 
most phones have built-in video cameras, and knowledge of a 
video can spread extremely rapidly to vast numbers of people, 
through social networks including e-mail, Facebook, Twitter, 
and the like. As deployed capacity approaches saturation, we 
need new transmission architectures to guarantee our wireless 
networks continue to deliver traffic effectively and efficiently. 

This paper addresses precisely this problem. More specif- 
ically: we consider the simple, yet increasingly common 
setting, where a user (e.g., a student on a college campus) 
generates a large file (a short video, for example) and wants 
to spread it to her social network - her friends, their friends, 
and so on. In the current paradigm, the file creator uploads the 
file to a central server (e.g., YouTube) and then spreads word 
of its existence through Facebook, Twitter, etc. Upon learning 
of the file's existence, interested (we call them "eager") users 
then download the file from the server, using their provider's 
wide area network (WAN). Since the WAN has bounded 
bandwidth, the file dissemination time will necessarily scale 
linearly in the number of users who ultimately receive the 
file. Particularly in a dense setting like a college campus, this 
inherently limited centralized scheme for file dissemination 



may be highly suboptimal. The central question in this paper 
is: how much better can we do? 

Increasingly, smartphones and similar devices are 
equipped with both GPS and peer-to-peer transmission modes. 
In dense environments, this opens the possibility of forming 
a wireless ad hoc network in which users communicate with 
each other through several hops of short-distance transmissions. 
As shown in Gupta and Kumar's seminal work [1], the 
spatial capacity of a wireless ad hoc network scales as $\sqrt{n}$, 
a sharp contrast to the fixed capacity of a WAN. While this 
scaling spatial capacity of ad hoc networks provides a potential 
way forward, naive implementation presents severe problems 
that may leave us worse off than the currently implemented 
WAN solution. We may have severe congestion caused by 
subsets of users getting a high number of requests, hence 
resulting in hot-spots in the network. This will occur, for 
instance, if users request the file from neighbors on their social 
network, as most social networks exhibit the presence of super- 
nodes with very high degree. This is particularly true in the 
broadcast setting we have here, when we expect there to be 
such hot spots, which can potentially reduce network capacity 
by a significant factor ITtI . 

A. Main contributions 

In this paper we propose a simple and distributed file 
dissemination algorithm that takes advantage of two main 
ideas: (i) knowledge of the file spreads quickly because of 
the structure of the social network - we can use the same 
to manage file dissemination; (ii) in dense settings where ad 
hoc networks make sense, exploiting geographic proximity 
can provide additional benefits. With these ideas in mind, we 
devise a file dissemination algorithm that works by passing 
messages through the social network, and requires limited 
communication and computation overhead. In particular, the 
main features of our algorithm are as follows: 

1) Load balancing: users receiving a large number of requests 
distribute them to nearby users on the social 
network, in such a way that we can guarantee no user 
has to serve more than six other users. Our algorithm 
achieves $\sqrt{n}$-scaling with the number of users receiving 
the file, which is sublinear, in sharp contrast to the linear scaling 
required in the WAN file dissemination architecture. 

2) Exploiting geographic proximity: We extend our load-balancing 
algorithm to exploit geographic proximity. 
Because of the structure of the social network, we show 
that by searching a few hops deeper in their social 
network, most users are able to download the file from 
another user at close range. This idea allows us to further 
reduce the scaling below $\sqrt{n}$, depending on the depth 
of the social network a user may search. 
3) Social Networks: We analyze our algorithm on popular 
models for social networks (power law graphs). We show 
that the file dissemination time scales sublinearly with 
n for a broad range of social-network parameters. In 
addition, we show that the performance of our algo- 
rithm is comparable to the best possible dissemination 
time of any algorithm - even those not constrained by 
communication or computation time. 

B. Related work 

Single-piece file dissemination problems were first studied 
in [18], [19]. Analytic results for multi-piece file dissemination 
problems are provided in [20]-[22]. Other topics related 
to influence spreading, epidemics, and content distribution in 
social networks can be found, for example, in [23]-[25] and 
references therein. 

Multi-hop transmission in a wireless network has been 
studied extensively since Gupta and Kumar's seminal work 
[1]. Subsequently, [3] provides a simple proof and [2] closes 
the gap of $1/\sqrt{\log(n)}$. Multicast and broadcast capacities are 
considered in, e.g., [11]-[13]. On the other hand, [14]-[17] use 
randomized schemes to balance the traffic load and achieve 
throughput optimal routing. 

C. Paper organization 

We introduce the system model in Section II. In Section 
III, we present our algorithm and main results. Some lemmas 
regarding random placement and random graphs are included 
in Section IV. We analyze the performance of our algorithm in 
Section V. Conclusions are provided in Section VI. The proofs 
of various lemmas and theorems in Section III and Section IV 
can be found in Appendix A and Appendix B. 

II. System description 

In this section we describe the basic system model, includ- 
ing the model for the wireless network and the placement of 
the nodes, and the model for the social network. 

A. Random wireless network and Gaussian channel model 

We model our network as n static nodes, placed independently 
and uniformly on a square of width $\sqrt{n}$. Thus the 
(expected) density of the network stays constant. Each node 
has a transmitter and a receiver. All nodes can communicate 
with each other with fixed power P. The interference model 
is described by a Gaussian channel model defined below. 

Definition 1: (Gaussian channel model) Index nodes by 1, 
2, ..., n. Let $x_i$ be the location of node i. Let $\mathcal{A}$ be the set of 
active transmitters at this time instant. The transmission rate 
$R(x_i, x_j)$ from node i to node j is 

$$R(x_i, x_j) = \log\left(1 + \frac{P\,\ell(x_i, x_j)}{N_0 + \sum_{k \in \mathcal{A},\, k \neq i} P\,\ell(x_k, x_j)}\right). \tag{1}$$

Here, $N_0$ is the ambient noise power, and $\ell(x, y)$ represents the 
power attenuation function between points x and y on the square, 
given by 

$$\ell(x, y) = \min\left\{1, \frac{e^{-\gamma \|x - y\|}}{\|x - y\|^{\alpha}}\right\}, \tag{2}$$

where, as usual, $\|x - y\|$ is the Euclidean distance between x 
and y. 

In this paper, we consider either $\gamma > 0$, or $\gamma = 0$ and $\alpha > 2$. 
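
As a rough illustration of the channel model, the following Python sketch evaluates a signal-to-interference-plus-noise rate of the kind described in Definition 1. It assumes the standard form in which the attenuated signal power is divided by noise plus the attenuated power of all other active transmitters; the function names, the noise parameter `noise`, and the default values `alpha=3.0` and `gamma=0.0` are illustrative choices, not values taken from the paper.

```python
import math

def attenuation(x, y, alpha=3.0, gamma=0.0):
    """Power attenuation min{1, exp(-gamma*d) / d**alpha}, with d = ||x - y||."""
    d = math.dist(x, y)
    if d == 0.0:
        return 1.0  # co-located points: the attenuation is capped at 1
    return min(1.0, math.exp(-gamma * d) / d ** alpha)

def rate(i, j, positions, active, power=1.0, noise=1.0, alpha=3.0, gamma=0.0):
    """Rate from node i to node j while all nodes in `active` transmit.

    `noise` stands in for the ambient noise power (an assumed parameter).
    """
    signal = power * attenuation(positions[i], positions[j], alpha, gamma)
    interference = sum(
        power * attenuation(positions[k], positions[j], alpha, gamma)
        for k in active
        if k != i
    )
    return math.log(1.0 + signal / (noise + interference))

# Example: a second active transmitter lowers the rate of the link 0 -> 1.
positions = {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (10.0, 0.0)}
rate_alone = rate(0, 1, positions, {0})
rate_with_interferer = rate(0, 1, positions, {0, 2})
```

The example only shows the qualitative effect of interference; the natural logarithm is used here, so rates are in nats.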

B. Model for social networks 

As we identify users with their devices (e.g. cell phones/ 
PDA), the n nodes in the wireless network also form a social 
network. A social network is described as a graph $G = (V, E)$ 
where V is the set of nodes with cardinality n and E is the 
set of edges. Two nodes are joined by an edge if (and only 
if) the corresponding users are friends in the social network. 
The distance between two nodes x and y on the social-graph 
G is the minimum number of hops between x and y in the 
social network. Thus a node's neighbors are the nodes one 
hop away on the social graph, and its k-neighborhood is the set of 
nodes within k hops on the social graph. A key property 
we exploit is that distance between two nodes on the social 
network is generally unrelated to geographic distance between 
the corresponding users in the wireless network. 

Empirical studies of many social (and other) networks have 
shown them to satisfy so-called power law graph structure, 
including many collaboration networks, but also the Internet 
and many communication networks (see e.g. Ill El ID ifTOll ). 
As a consequence, power law graphs (which we define below) 
are a popular choice for modeling social networks. 

A graph G is called a power law graph with parameter $\beta$ if 
the number of nodes with degree k is proportional to $k^{-\beta}$. We 
will consider social networks generated by random power law 
graphs [4]. These random graphs satisfy an important property: 
each node has only a small number of neighbors, i.e., small 
degree (small relative to the size of the overall network) while 
the diameter of the random graph (the maximum number of 
hops between the vast majority of the nodes) is still small, with 
overwhelming probability. This property is consistent with 
properties of most social networks, and in particular, with the 
famous observation known as the small world phenomenon, 
first discussed in Q. 

As is common, we generate random graphs and in particular 
random power law graphs, according to expected degree 
sequences [4]. 

Definition 2: ([4]) Let $w = (w_1, w_2, \ldots, w_n)$ be an expected 
degree sequence satisfying $\max_i w_i^2 < \sum_{1 \le k \le n} w_k$. 
We say $G = (V, E)$ is a random graph generated by the degree 
sequence w if each edge $(i, j)$ is present independently with probability 
$p_{ij} = w_i w_j / \sum_{1 \le k \le n} w_k$. 

Definition 3: (Q) A random graph generated by Definition 
2 is a random power law graph with parameter $\beta$, average 
degree d and maximum expected degree M if $w_i$ is chosen 
by 

$$w_i = c\,(i_0 + i)^{-1/(\beta - 1)}, \tag{3}$$

where $c = \frac{\beta - 2}{\beta - 1}\, d\, n^{1/(\beta - 1)}$ and $i_0 = n \left( \frac{d(\beta - 2)}{M(\beta - 1)} \right)^{\beta - 1}$. 

The well-known Erdős-Rényi graph, denoted by $G(n, p)$, is 
the graph where each edge is present with probability p. It 
is thus a random graph with expected degree sequence 
$w = (np, np, \ldots, np)$. 

For convenience, we further introduce the following notation. 
Given a subset $S \subseteq V$, let the volume of S be 
$\mathrm{vol}(S) = \sum_{i \in S} w_i$, i.e., the sum of weights of nodes in S. Similarly, 
define $\mathrm{vol}_k(S) = \sum_{i \in S} w_i^k$, and let $\tilde{d} = \mathrm{vol}_2(G)/\mathrm{vol}(G)$. 
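
The random graph model above can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual expected-degree construction in which an edge between i and j is present independently with probability $w_i w_j / \sum_k w_k$, and assuming the power law weights follow the form of (3); the function names are hypothetical and the quadratic-time sampling loop is only suitable for small n.

```python
import random

def power_law_weights(n, beta, d, M):
    """Expected degrees w_i = c * (i0 + i) ** (-1 / (beta - 1)) for i = 1..n."""
    c = (beta - 2) / (beta - 1) * d * n ** (1 / (beta - 1))
    i0 = n * (d * (beta - 2) / (M * (beta - 1))) ** (beta - 1)
    return [c * (i0 + i) ** (-1 / (beta - 1)) for i in range(1, n + 1)]

def expected_degree_graph(w, rng):
    """Sample each edge {i, j} independently with probability w_i * w_j / sum(w)."""
    total = sum(w)
    n = len(w)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(1.0, w[i] * w[j] / total):
                adj[i].add(j)
                adj[j].add(i)
    return adj

def vol(w, subset, k=1):
    """Volume of order k: the sum of w_i ** k over the nodes in `subset`."""
    return sum(w[i] ** k for i in subset)

def second_order_average_degree(w):
    """Ratio vol_2(G) / vol(G) over all nodes."""
    nodes = range(len(w))
    return vol(w, nodes, 2) / vol(w, nodes)

# Example: an Erdos-Renyi graph G(n, p) is the special case w = (np, ..., np).
n, p = 100, 0.05
er_weights = [n * p] * n
er_graph = expected_degree_graph(er_weights, random.Random(1))
pl_weights = power_law_weights(1000, 3.5, 10.0, 50.0)
```

For the constant weight sequence, every edge probability reduces to p and the second-order average degree equals np, which matches the Erdős-Rényi special case mentioned in the text.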

C. Assumption on file length 

The transmission time consists of two parts: propagation 
delay and file receiving time. The propagation delay is the 
time required to receive the first bit since the start of the 
transmission. The file receiving time is the remaining time required to 
complete the transmission after the first bit arrives. For simplicity, we assume 
the file length F is large, and we ignore the propagation 
delay in the analysis. We note in passing that we can formally 
incorporate both propagation delay as well as the file receiving 
time in our analysis by scaling F such that the propagation 
delay terms will be sub-dominant to the file receiving time. 

III. Algorithm and main results 

We are now ready to present our algorithm and state our 
main results. At some initial time, the file generator (the 
source) creates the file, and advertises it on her social network. 
At any given time, a node either has the file (active node), 
knows about the file and wants it because one of its social-network 
neighbors has it (eager node), or is oblivious to its 
existence (inactive node). 

The algorithm proceeds in three phases. In the Requesting 
Phase, eager nodes use their social network to request the file 
from active nodes - if knowledge of geographic location is 
available, nodes favor (geographically) nearby active nodes. 
In the Scheduling Phase, again the social network is used 
to schedule a sequence of transmissions whereby each eager 
node is assigned a transmission node from which it will obtain 
the file. In the Transmission Phase, nodes transmit the file 
to their appointed requestors, employing established routing 
techniques [2]. This final third phase is conceptually distinct 
from the first two phases, and it is important to emphasize this 
point here. The routing techniques used are independent of 
the social network structure, and follow the multi-hop ad hoc 
network protocols described in, e.g., [1], [2]. Thus, while the 
requesting and scheduling in Phases 1 and 2 are constrained 
by the social network, the routing in Phase 3 is not. 

We present a single algorithm that accommodates two 
settings: in the first, simpler setting, nodes have no notion of 
geography, and may not request the file from active nodes 
more than a single hop away on their social network. In 
the second setting, nodes are aware of geography and hence 
distance, and "prefer" to request the file from geographically 



nearby nodes. Moreover, they are allowed to search for such 
nearby nodes beyond their immediate neighbors in the social 
network. 

Our algorithm accommodates both settings: the first, by 
setting the "preferred distance" to infinity and the number 
of search-hops to 1, and the second, by limiting the preferred 
distance, and by expanding the number of allowed search-hops. 
In Section III-B, we consider the first setting: no geographic 
information available. We show that for most social 
networks, our algorithm gives $\sqrt{n}$-scaling. We consider the 
second setting in Section III-C, where nodes have access to 
geographic position information. We show that again for many 
social networks, the dissemination time can be further reduced 
to scale more slowly than $\sqrt{n}$. 

A. Algorithm 

Our algorithm takes as input the diameter of the social 
network, D, as well as two parameters which we specify: $\epsilon$ 
and $\mathcal{L}$, whose roles are as follows. Nodes are allowed to search 
for another node in the social network from which to download 
the file, at a distance of at most $2\epsilon D + 1$ hops away. Thus 
if $\epsilon = 0$, they cannot look beyond a single hop away, and if 
$\epsilon = 0.5$, they have access to the entire social network. Hence the 
parameter $\epsilon$ controls the search depth. The parameter $\mathcal{L}$ is used 
to exploit geographic proximity: most nodes will download the 
file from nodes that are at a geographic distance of at most $\mathcal{L}$. 
If nodes have no notion of geography, we set $\mathcal{L} = \infty$, hence 
all nodes are within $\mathcal{L}$. Otherwise, we set $\mathcal{L}$ to a smaller value. 

Given parameters $(\epsilon, \mathcal{L}, D)$ as described above, the algorithm 
finds active nodes from which eager nodes can download 
the file. This is accomplished by coordinating through the 
social network. 

The main idea is the following: eager nodes send requests 
to one of their social-network neighbors with the file. Since a 
single node may get many such requests, it does not serve all of 
them, but rather finds other active nodes nearby in the social 
network to serve them, and also enlists the receiving nodes 
themselves to forward along the file. The theorems given in 
Sections III-B and III-C show that for the specific choices of 
parameters $\epsilon$ and $\mathcal{L}$ given, the algorithm succeeds in delivering 
the file to all nodes, and moreover does so in the advertised 
time scaling. 

When $\mathcal{L}$ is set to a finite value, it may not always 
be possible for nodes to obtain the file from geographically 
proximate neighbors: for instance, suppose the generator has 
no neighbors in her geographic proximity. In such cases, we 
allow file transfers that exceed geographic distance $\mathcal{L}$, and 
these happen from one- or two-hop neighbors on the social 
network. We call transfers within geographic distance $\mathcal{L}$ 
$\mathcal{L}$-transfers, and all other transfers $\mathcal{S}$-transfers, since they are 
near in the social-network distance. Similarly we refer to 
$\mathcal{L}$-requests and $\mathcal{S}$-requests. 

ALGORITHM 1: 

Input: parameter $\epsilon$, distance threshold $\mathcal{L}$, and the diameter 
of the social network D. 

Requesting Phase: Consider an eager node, x, at time t. 

Step 1: Let $\mathcal{N}_x(t)$ denote node x's $(2\epsilon D + 1)$-neighborhood 
in the social-graph at time t. Let $\mathcal{N}_x^{\mathcal{L}}(t) \subseteq \mathcal{N}_x(t)$ be the set 
of nodes in $\mathcal{N}_x(t)$ that have the file and whose Euclidean 
(geographic) distance to x does not exceed $\mathcal{L}$. 

Step 2: If $\mathcal{N}_x^{\mathcal{L}}(t)$ is not empty, x sends an $\mathcal{L}$-request to a 
randomly picked node in $\mathcal{N}_x^{\mathcal{L}}(t)$. 

Step 3: If $\mathcal{N}_x^{\mathcal{L}}(t)$ is empty and the distance from x to the 
source on the social-graph is smaller than $\epsilon D + 1$, then x sends 
an $\mathcal{S}$-request to a one-hop neighbor in the social-graph which 
has the file. 

Step 4: Otherwise, x waits and goes back to Step 1 at time 
$t + 1$. 
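
The four steps of the Requesting Phase can be sketched for a single eager node as follows. This is only an illustrative sketch: the function names, the dictionary-based graph representation, and the choice to round the search depth $2\epsilon D + 1$ down to an integer are assumptions made here, not details from the paper.

```python
import math
import random
from collections import deque

def hop_distances(adj, start, max_hops=None):
    """Breadth-first hop distances from `start`, optionally truncated at max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if max_hops is not None and dist[u] == max_hops:
            continue  # do not expand beyond the allowed search depth
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def choose_request(x, adj, pos, active, source, eps, L, D, rng):
    """Return ('L', y), ('S', y), or None (wait) for the eager node x."""
    # Step 1: active nodes within the search depth and within distance L.
    depth = int(2 * eps * D) + 1  # assumed rounding of 2*eps*D + 1
    nearby = [
        y for y in hop_distances(adj, x, depth)
        if y != x and y in active and math.dist(pos[x], pos[y]) <= L
    ]
    # Step 2: send an L-request to a randomly picked nearby active node.
    if nearby:
        return ("L", rng.choice(sorted(nearby)))
    # Step 3: close to the source on the social graph, so fall back to
    # an S-request addressed to a one-hop neighbor that has the file.
    if hop_distances(adj, source).get(x, math.inf) < eps * D + 1:
        holders = [y for y in adj[x] if y in active]
        if holders:
            return ("S", rng.choice(sorted(holders)))
    # Step 4: wait and retry in the next time step.
    return None

# Example on a path 0 - 1 - 2 - 3 whose source (node 0) is the only active node.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (50.0, 0.0), 3: (51.0, 0.0)}
rng = random.Random(0)
no_geography = choose_request(1, adj, pos, {0}, 0, 0.0, math.inf, 3, rng)
with_geography = choose_request(1, adj, pos, {0}, 0, 0.2, 0.5, 3, rng)
```

In the example, node 1 issues an $\mathcal{L}$-request when the distance threshold is infinite, and falls back to an $\mathcal{S}$-request when the threshold is too small to reach the source geographically.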

Scheduling Phase: Consider an active node y. It maintains 
two balanced binary trees, an $\mathcal{L}$-tree and an $\mathcal{S}$-tree, constructed 
from its $\mathcal{L}$-requests and $\mathcal{S}$-requests, respectively. It builds these 
trees by adding requesting nodes sequentially, as the requests 
arrive. This sequential building of the binary trees is depicted 
in Figure 1. 

When node y receives an $\mathcal{L}$-request, node y adds the eager 
node to the $\mathcal{L}$-tree and asks its parent on the tree to deliver 
the file, and similarly for $\mathcal{S}$-requests. 
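
One simple way to keep such a request tree balanced is to fill it level by level, as in an array-backed binary heap. The sketch below assumes this level-order insertion rule; the class name `RequestTree` and its methods are hypothetical and serve only to illustrate how each requester is assigned a parent that will deliver the file.

```python
class RequestTree:
    """Binary tree filled level by level; the root is the active node."""

    def __init__(self, root):
        # Nodes stored in arrival order, which is also breadth-first order.
        self.order = [root]

    def add(self, requester):
        """Append a requester and return the node assigned to serve it."""
        self.order.append(requester)
        index = len(self.order) - 1  # heap-style index of the new node
        return self.order[(index - 1) // 2]

    def children(self, node):
        """Return the (at most two) nodes that `node` must serve in this tree."""
        i = self.order.index(node)
        return self.order[2 * i + 1: 2 * i + 3]

# Example: six requests arriving at active node "y".
tree = RequestTree("y")
parents = [tree.add(k) for k in range(1, 7)]
```

With six sequential requests, the root serves requesters 1 and 2, requester 1 serves 3 and 4, and requester 2 serves 5 and 6, so no node serves more than two others within one tree and the depth grows only logarithmically in the number of requests.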

Transmission Phase: An eager node waits until the node 
designated as its transmitting node in the Scheduling Phase 
has the file. It then sets up a wireless transmission, and routes 
data through a highway system described in [2]. Note that the 
transmitter will have to serve at most 6 nodes: 2 from its own 
$\mathcal{L}$-tree, 2 from its own $\mathcal{S}$-tree, and 2 from the tree it joins 
when it is an eager node (which could be either an $\mathcal{L}$-tree 
or an $\mathcal{S}$-tree). Thus, we divide a time slot into six and each 
transmitter serves all nodes in a round robin fashion. 




Fig. 1. Each active node maintains balanced binary trees and adds requesting 
nodes to trees sequentially. Suppose the active node depicted at the root gets 
four requests at time t, and two more at time t + 1. The resulting tree might 
look as depicted. The original active node would then serve nodes 1 and 2, 
subsequently node 1 would serve nodes 3 and 4, and node 2 would serve 
nodes 5 and 6. 



B. Main results: load balancing 

In this section we show that the load-balancing accomplished 
by the $\mathcal{L}$-binary trees is enough to give $\sqrt{n}$-scaling, 
without any geographic information. We show that our result 
holds, as long as the social network has the properties of a 
random power law graph with $\beta > 2$, minimum expected 
degree $m > 3$ and maximum expected degree M satisfying 
$\log(n) \ll M \ll \sqrt{n}$ (many social networks have values of $\beta$ 
larger than this; see, for example, collaboration graphs in [9]). 
In this case, the diameter of the social-graph is $O(\log(n))$ and 
the size of the largest component is $\Theta(n)$ [4], [5]. 



As discussed, we set $\mathcal{L} = \infty$ and $\epsilon = 0$; thus nodes are only 
allowed to request the file from nodes at most one hop away 
on the social network, and they entirely ignore geography. 

In this case, for any eager node x, we have $\mathcal{N}_x^{\mathcal{L}}(t) \neq \emptyset$ at 
the time t node x becomes eager, and hence the Requesting 
Phase of the algorithm uses only Step 1 and Step 2. There are 
only $\mathcal{L}$-requests, and thus the algorithm requires each node 
to transmit to at most 4 other nodes. Indeed, the point of 
this algorithm is to distribute the load evenly on the wireless 
network. 

In Theorem 4, we show that the file dissemination time 
scales like $\sqrt{n}$ (sublinearly). In addition, we show that the 
performance differs from algorithm-independent lower 
bounds only by a factor of $n^{\delta}$ for any $\delta > 0$. Since the proofs of 
the following two theorems are similar to those for Theorem 
7 and Theorem 8, we defer the full details to Appendix B. 

Theorem 4: Consider the file dissemination problem with 
wireless network and social network as defined above. Sup- 
pose the file length is F. Then the file dissemination time for 
Algorithm 1 is 

$$O\left(\sqrt{n}\,\log^{2}(n)\,F\right) \tag{4}$$

with high probability. 

Theorem 5: Consider the file dissemination problem with 
wireless network and social network as defined above. Sup- 
pose the file length is F. Then, for any algorithm that allows 
nodes to download the file from their 1- and 2-hop neighbors 
on the social network, the file dissemination time is lower 
bounded by 

$$\Omega\left(n^{1/2 - \delta} F\right), \tag{5}$$

for any $\delta > 0$, with high probability. 

Remark 6: Significantly, the only properties of power law 
graphs we use are the size of the diameter and the maximum 
degree. Specifically, given a graph G with diameter $\ell_{\max}$ 
and maximum degree $d_{\max}$, the file dissemination time is 
$O(\sqrt{n}\,\log(d_{\max})\,\ell_{\max} F)$ if nodes are only allowed to download 
the file from nodes at most 2 hops away. The proof of 
this follows immediately from the proof of the theorem. 

C. Main results: exploiting geography 

Intuitively, increasing the number of geographically proximal 
downloads should decrease transmission time. We show 
that this can be accomplished, at the cost of deeper searching 
of the social network, as long as the social network has the 
properties of a random power law graph with $\beta > 3$ (again, 
many graphs have this property, see, e.g., the collaboration 
graphs in [10]). We assume that the minimum expected 
degree is $m = K\log(n)$ where K is a constant greater 
than 10, and the maximum expected degree is M, satisfying 
log^ (n) ^ M <C \/ri. Thus, almost all nodes are in the largest 
component and the diameter of the graph is $D \approx \log_{\tilde{d}}(n)$ 
[4], [5] (recall the definition of $\tilde{d}$ from Section II). 

Setting $\epsilon$ to a positive value translates to allowing nodes to 
search for an active node in their $(2\epsilon\log_{\tilde{d}}(n) + 1)$-neighborhood, 
and because of our load-balancing architecture, ultimately 
download the file from nodes in their $(4\epsilon\log_{\tilde{d}}(n) + 2)$-neighborhood. 
With more active nodes available, eager nodes can 
more easily find geographically proximal active nodes. We 
set $\mathcal{L} = 8\sqrt{n^{1-\epsilon'}\log(n)/(\sigma\pi)}$ for any $\epsilon' < \epsilon$. The value of $\epsilon$ 
is chosen to be small, $\epsilon < 1/10$, allowing nodes to search 
a neighborhood that is large, but nevertheless a vanishing 
fraction of the size of the entire network. 

As load-balancing alone was able to achieve file dissemination 
time scaling of $\sqrt{n}$, we show now that by additionally 
exploiting geography, the file dissemination time can be further 
reduced by a factor $n^{\epsilon'/2}$ compared to the result in Theorem 
4. Proofs of the two theorems can be found in Section V. 

Theorem 7: Suppose the source is chosen uniformly at 
random from the nodes in the largest component and the 
file length is F . Consider the setting described above. Then 
the file dissemination time under Algorithm 1 with parameter 
$0 < \epsilon < 0.1$ is 

$$O\left(\sqrt{n^{1-\epsilon'}}\,\log^{5/2}(n)\,F\right), \tag{6}$$

for any $\epsilon' < \epsilon$ with high probability. 

Theorem 8: Consider the file dissemination problem under 
the setting described above. Let F be the file length. Then, 
for any algorithm that allows nodes to download the file 
from their $(4\epsilon\log_{\tilde{d}}(n) + 2)$-neighborhood with $\epsilon < 0.1$, the file 
dissemination time is lower bounded by 

n{n^/^-^'-iF), (7) 

with high probability for any ^ > 0. 

IV. Random Placement and Random Graphs 

In preparation for the proof in the next section, we give 
some lemmas that characterize the behavior of randomly 
placed nodes in a square, and also give properties of random 
graphs. 

A. Results about random placement of nodes in a square 

Two properties, in particular, are important. For our scheme 
to work, we need to show that with overwhelming probability, 
we will not have a very high clustering of nodes (some 
clustering will occur). We also need to show that when nodes 
look in their social network for geographically proximate 
active nodes, they will be able to find at least one, with 
high probability. The next two lemmas show precisely these 
properties. 

In the first lemma, we bound the minimum 
distance between a node and k other nodes. We use this 
lemma to ensure that each node can find a node close to 
it on the wireless-square. In the second lemma, we show a 
concentration result about the number of nodes falling into a 
small rectangle, thus showing it is not too big. The proofs of 
these lemmas are also available in Appendix B. 

Lemma 9: Place $k + 1$ nodes on a square of width $\sqrt{n}$ 
independently and uniformly. Let r be the minimum distance 
from the first node to the others. Then, we have 

P(r > A/64nlog(n)/7rfc) < n"^. (8) 
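
The statement of Lemma 9 can be sanity-checked numerically. The following Monte Carlo sketch places k + 1 uniform nodes on a square of width $\sqrt{n}$ and estimates how often the minimum distance from the first node to the others exceeds the threshold, taken here to be $\sqrt{64 n \log(n)/(\pi k)}$. The function names and the example values of n, k and the trial count are arbitrary assumptions; the simulation is an illustration, not a proof.

```python
import math
import random

def min_distance_to_first(n, k, rng):
    """Place k + 1 uniform nodes on a sqrt(n) x sqrt(n) square and return
    the minimum distance from the first node to the other k nodes."""
    side = math.sqrt(n)
    first = (rng.uniform(0, side), rng.uniform(0, side))
    return min(
        math.dist(first, (rng.uniform(0, side), rng.uniform(0, side)))
        for _ in range(k)
    )

def exceed_fraction(n, k, trials, seed=0):
    """Empirical fraction of trials in which the minimum distance exceeds
    the threshold sqrt(64 * n * log(n) / (pi * k))."""
    rng = random.Random(seed)
    threshold = math.sqrt(64 * n * math.log(n) / (math.pi * k))
    hits = sum(
        min_distance_to_first(n, k, rng) > threshold for _ in range(trials)
    )
    return hits / trials

# Example: with n = 10000 and k = 2000 the threshold is roughly 30.6.
fraction = exceed_fraction(10_000, 2_000, 200)
```

In this example a violation would require roughly two thousand nodes to all miss a disk (or, near a corner, a quarter disk) of radius about 30.6, so no violations are expected in a few hundred trials.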



Lemma 10: Place n nodes on a square of width $\sqrt{n}$ independently 
and uniformly. Given a rectangle of area A where 
A — Ld{log{n)), let X be the number of nodes in the rectangle. 
Then, 

P{X > 2A) < n-^. (9) 

B. Results about the neighborhood behavior of random graphs 

In the following lemma, we address the relation between 
weights and the number of neighbors. Specifically, we show 
that if a node has weight $w_i$ greater than $10\log(n)$, then the 
number of one-hop neighbors the node can reach in the social-graph 
is between $w_i/2$ and $2w_i$. We use this lemma as it 
provides a relationship between weights and the number of 
nodes. 

Lemma 11: Suppose $w_i > 10\log(n)$. Let X be the number 
of one-hop neighbors in the social-graph of node i. Then, 
Wi/2 < X < 2wi with probability 1 — o{n~^). 

The next two lemmas characterize the local behavior of 
random power law graphs. Specifically, we are interested 
in how the size of neighborhoods of nodes in the largest 
component grows. We show that for any node in the largest 
component, the number of nodes in a small neighborhood 
grows like a factor d if we explore one more step. We prove 
this by providing upper and lower bounds that only differ by a 
factor of for any f > 0. The proofs are shown in Appendix 
A. 

Lemma 12: Consider a random power law graph with parameter 
$\beta > 3$. Suppose the minimum expected degree is 
$m = K\log(n)$ for some $K > 10$ and the maximum expected 
degree is M ^ log^(n). Then, there are at least avf- nodes in 
a node's e logj(n)-neighborhood with probability 1 — o(n^^), 
for any e' < e < 0.1. Here, cr is a constant depending on /3 
and K. 

Lemma 13: Consider a random power law graph with parameter 
$\beta > 3$. Suppose the minimum expected degree is 
$m = K\log(n)$ for some $K > 10$, the maximum expected 
degree is M \og^{n), and e < 0.4. Consider a node 
either picked randomly or with weight smaller than W. Then, 
there are at most 2Wd^-nf / log(ri) nodes in this node's 
e logj(n) + A-neighborhood, with probability 1 — C'(log^^(n)), 
for any e' > e and any fixed constant A where 

f \og^'f^-^{n) if 3 < ^ < 4 

~ 1 max{log^/'^-'^(n),log^(n)} if 4<^ 

(10) 

V. Performance Analysis 

A. Proof of Theorem 7 

In this section, we first prove Theorem 7, which states the 
performance of our algorithm when geographic information 
is available, and when nodes can download the file from a 
neighborhood of radius $4\epsilon\log_{\tilde{d}}(n) + 2$. The proof of the 
simpler load-balancing case (where we set $\mathcal{L} = \infty$ and $\epsilon = 0$) 
is essentially a consequence of this proof; for the full 
details we refer to Appendix B. Specifically, we show the file 
dissemination time is roughly $\sqrt{n^{1-\epsilon}}\,F$. 

The proof of the theorem consists of two main parts: 
showing the existence of a geographically nearby neighbor 
in the wireless-graph and the analysis of transmission rates. 
In addition, the transmission phase of our algorithm relies on 
some routing results from [2] and [12], which we summarize 
here. For the full details, we refer readers to those individual 
papers. In the routing scheme, packets are routed through a 
highway system consisting of horizontal highways and vertical 
highways. Each highway serves nodes in a stripe on the 
wireless-square. An illustration is shown in Fig. 2. The results 
in [2] and [12] guarantee the following properties of this 
highway system. 

1) Nodes can reach their highways in a hop of length 
$O(\log(n))$. 

2) The highways are almost straight. For example, if a flow 
on a horizontal highway starts from x-coordinate $a_1$ with 
destination at x-coordinate $a_2 > a_1$, it will not reach 
any node with x-coordinate smaller than $a_1 - H$ where 
$H = O(\log(n))$. 

3) Highway nodes can communicate with neighboring 
highway nodes with a constant rate. A highway node 
serves flows through it with equal rate. 




Fig. 2. An illustration of the highway system and routing. Packets are first 
routed through horizontal highways to vertical highways corresponding to 
destinations. 

We now move to the proof of the theorem. We first state the 
existence of an "intermediate node" in the following lemma. 

Lemma 14: Consider a random power law graph with $\beta > 3$ 
and $m = K\log(n)$ where $K > 10$ is a constant. For some 
$\epsilon < 0.1$, consider a node x in the largest component, such 
that the distance from x to the source on the social network 
is greater than $\epsilon\log_{\tilde{d}}(n)$. Then, there exists a node y which 
satisfies the follows with probability at least 1 — o(n^^): 

1) y is in the $(2\epsilon\log_{\tilde{d}}(n) + 1)$-neighborhood of x in the 
social-graph. 

2) The distance from y to the source on the social network 
is smaller than that from x to the source. 

3) The Euclidean distance from x to y on the wireless-square 
is smaller than $\mathcal{L}$. 

Proof. We first show that there exist an'^ nodes satisfying 1) 
and 2) with probability l — o{n~^). Let dx be the distance from 
x to the source on the social network. Since $d_x > \epsilon\log_{\tilde{d}}(n)$, 
there exists a node z such that the distance from z to x on the 
graph is $\epsilon\log_{\tilde{d}}(n) + 1$ and the distance from z to the source 
on the graph is $d_x - \epsilon\log_{\tilde{d}}(n) - 1$. Therefore, nodes in the 
$\epsilon\log_{\tilde{d}}(n)$-neighborhood of z in the social-graph satisfy 1) and 
2). In addition, by Lemma 12, the size of such a neighborhood 
is greater than crn'^ with probability 1 — o{n^^). 

Thus, by Lemma |9l there exists a node y among the an'^ 
nodes whose Euclidean distance to x on the wireless-square 
is smaller than £ with probability 1 — o{n^^). ■ 

Proof. (Theorem 7) Recall that our algorithm classifies 
transmissions as those chosen because they are geographically 
within distance $\mathcal{L}$, called $\mathcal{L}$-transmissions, and those chosen 
because they are within two hops on the social network, called 
$\mathcal{S}$-transmissions. $\mathcal{S}$-transmissions are those whose Euclidean 
distances between transmitters and receivers on the wireless-square 
are not guaranteed to be less than $2\mathcal{L}$, as they are for 
$\mathcal{L}$-transmissions. Note that the number of $\mathcal{S}$-transmissions is 
smaller than the number of nodes in the $\epsilon\log_{\tilde{d}}(n)$-neighborhood 
of the source in the social-graph, which is smaller than 
2Wrf I log(n) for any e" > e with high probability by 
Lemma [13] 

Now, we bound the number of flows through a highway 
node at any time. Consider a transmission between two 
nodes with Euclidean distance less than $2\mathcal{L}$. By the fact 
that highways are almost straight and the first and last hops 
are of length $O(\log(n))$, the transmission passes through 
a horizontal (vertical) highway node only if the horizontal 
(vertical) distance between the transmitter (receiver) and the 
node is smaller than $3\mathcal{L}$ on the wireless-square. In other words, 
$\mathcal{L}$-transmissions through a horizontal (vertical) highway node 
must fall in a rectangle of side 6£ x /i in the corresponding 
horizontal (vertical) strip where /i is a constant provided in Q • 
Since, by Lemma [TO] the total number of nodes falling into 
this region is Oi^L) with probability 1 — o(n^^) and each node 
generates at most a constant number of flows, using the union 
bound we can conclude that all highway nodes have at most 
0{C) £-flows with probability 1 — o(l). In addition, since 
there are at most IW'rf j log(n) S'-transmissions, the total 
number of flows through each highway node is with 
probability 1 — o(l). Therefore, each flow has a rate r2(l/£) 
with high probability and each node can receive the file in 
cqCF time slots for some constant co > from the time 
when the transmission begins. 

We prove the theorem by induction on $k$, the distance
from a node to the source on the social-graph. Let $\mathcal{N}_k$
denote the nodes whose distance to the source is $k$ on the
social-graph. The claim of the induction is that a node in $\mathcal{N}_k$
can receive the file in at most $kc_0\log_2(n)\mathcal{L}F$ time slots. By
our notation, $\mathcal{N}_0$ is the source node. First note that the base
case $k = 1$ of the induction clearly holds. Now, we suppose
it is true for $k - 1$ and consider nodes in $\mathcal{N}_k$. Note that
no nodes in $\mathcal{N}_k$ are inactive at time $(k - 1)c_0\log_2(n)\mathcal{L}F$.
Further, by Algorithm 1 and Lemma 14, all nodes in $\mathcal{N}_k$
can request the file, according to the algorithm, from an
active node in $\cup_{i=0}^{k-1}\mathcal{N}_i$. Thus, these nodes have to wait at
most $\log_2(n) - 1$ successful transmissions before starting
to receive the file, since the depth of any binary tree is at
most $\log_2(n)$. Therefore, they can receive the file before time
$kc_0\log_2(n)\mathcal{L}F$. Hence, by induction, the file dissemination
time is $O(\sqrt{n^{1-\epsilon}}\log^{5/2}(n)F)$, as the diameter of the social-graph
is $O(\log(n))$. ■
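The arithmetic behind the induction step can be sketched in a few lines. This is only an illustration under stated assumptions: the requesters attached to one active node are assumed to form a heap-ordered complete binary tree, every tree hop is assumed to take $c_0\mathcal{L}F$ slots, and the names `tree_depth`, `hop_completion_slot` and `induction_bound` are hypothetical helpers, not part of Algorithm 1.

```python
import math

def tree_depth(j):
    """Depth of node j (1-indexed, heap order) in a complete binary tree; the root has depth 0."""
    return j.bit_length() - 1

def hop_completion_slot(j, c0, L, F):
    """Slot by which requester j holds the file, assuming each tree hop takes c0 * L * F slots."""
    return tree_depth(j) * c0 * L * F

def induction_bound(k, n, c0, L, F):
    """The claimed bound k * c0 * log2(n) * L * F for a node k social hops from the source."""
    return k * c0 * math.log2(n) * L * F

# Illustrative (assumed) parameters: n requesters under one active root.
n = 1024
worst = max(hop_completion_slot(j, 1.0, 10.0, 5.0) for j in range(1, n + 1))
```

With at most $n$ nodes in the tree, the deepest requester sits at depth at most $\log_2(n)$, so `worst` never exceeds `induction_bound(1, n, ...)`, which is the per-level cost multiplied by $k$ in the induction.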



B. Proof of Theorem 8

We proceed by first providing some definitions and a lemma.
Given a transmission pair with rate $r$ over a Euclidean
distance $\rho$ on the wireless-square, define the bit-meter product
of the transmission pair as $r\rho$. The total bit-meter product a
network can transmit is the supremum of the sum of bit-meter
products of all transmission pairs.

Lemma 15: The total bit-meter product the network can
transmit in a time slot is $\Theta(n)$.

Proof. From ([T]), we know the bit-meter product a transmission
pair $(x_i, x_j)$ can transmit is

$$\|x_i - x_j\|\log(1 + \cdots) \le P\,\ell(x_i, x_j)\,\|x_i - x_j\|/N_0. \tag{11}$$

Recall that $\ell(x_i, x_j)\|x_i - x_j\|$ is bounded by a constant either
for $\gamma > 0$ or $\gamma = 0$ and $\alpha > 2$. Since there are at most $n/2$
transmission pairs, the total bit-meter product the system can
transmit is $O(n)$ in a time slot. ■
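The way this lemma is used for lower bounds can be sketched as simple bookkeeping: add up rate times distance over the active pairs, then divide the total bit-meters a dissemination requires by the bit-meters available per slot. The function names and the numbers below are hypothetical and only illustrate the accounting.

```python
def bit_meter_product(pairs):
    """Sum of rate * distance over (rate, distance) transmission pairs."""
    return sum(rate * dist for rate, dist in pairs)

def min_slots(transport_load, per_slot_capacity):
    """Lower bound on time slots: bit-meters required divided by bit-meters available per slot."""
    return transport_load / per_slot_capacity

# Three illustrative pairs: (rate in bits per slot, distance in meters).
pairs = [(2.0, 3.0), (1.0, 4.0), (0.5, 8.0)]
total = bit_meter_product(pairs)      # 6 + 4 + 4 bit-meters in one slot
slots = min_slots(140.0, total)       # slots needed for a load of 140 bit-meters
```

Since the per-slot capacity is at most linear in $n$, any lower bound on the transport load translates into a lower bound on dissemination time after dividing by $n$.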

To prove the lower bound, we place no restrictions on 
computation or communication overhead. Moreover, we make 
(overly) optimistic assumptions throughout in order to guar- 
antee a bound. For instance, we assume nodes only download 
from their nearest social-network neighbors. 
Proof. (Theorem 8) Define the transport load as the infimum
of the total bit-meter product required to disseminate the
file under the problem setting. To apply Lemma 15, we just
need to show that the transport load is $\Omega(n^{3/2 - 2\epsilon'}F)$ with
probability $1 - o(1)$.

Let $\mathcal{M}$ be the set of nodes in the largest component with
expected degree in the range $[K\log(n), 2K\log(n)]$. Then,

$$|\mathcal{M}| = n\,\frac{\int_{K\log(n)}^{2K\log(n)} x^{-\beta}\,dx}{\int_{K\log(n)}^{M} x^{-\beta}\,dx} = (1 + o(1))(1 - 2^{1-\beta})\,n, \tag{12}$$

since almost all nodes are in the largest component.

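Fix any $\epsilon' > \epsilon$. Let $\mathcal{N}_i$ be the set of nodes that node $i$ can
reach in $4\epsilon\log_{\tilde d}(n) + 2$ hops in the social-graph. Let $X_i$ be the
indicator that the Euclidean distance from node $i$ to $\mathcal{N}_i$ on the
wireless-square is smaller than $\sqrt{n/(2\pi W\tilde d^2 n^{4\epsilon'})}$. Therefore,
we have, for $i \in \mathcal{M}$, $n$ large enough and some constant $c_1$,

$$P(X_i = 1) \le P(\{X_i = 1\} \cap \{|\mathcal{N}_i| \le 2W\tilde d^2 n^{4\epsilon'}/\log(n)\}) + P(|\mathcal{N}_i| > 2W\tilde d^2 n^{4\epsilon'}/\log(n)) \le 1/\log(n) + O(1/\log(n)) \le c_1/\log(n),$$

since the probability that a node is close is smaller than
$1/(2W\tilde d^2 n^{4\epsilon'})$ and the second term comes from the probability
that $|\mathcal{N}_i| > 2W\tilde d^2 n^{4\epsilon'}/\log(n)$. Therefore, we have

$$E\Big[\sum_{i \in \mathcal{M}} X_i\Big] \le 2c_1(1 - 2^{1-\beta})\,n/\log(n). \tag{13}$$

We claim that

$$P\Big(\Big|\sum_{i \in \mathcal{M}} X_i - E\Big[\sum_{i \in \mathcal{M}} X_i\Big]\Big| > n/\log^{1/3}(n)\Big) = o(1).$$

Indeed, by Chebyshev's inequality, we have

$$P\Big(\Big|\sum_{i \in \mathcal{M}} X_i - E\Big[\sum_{i \in \mathcal{M}} X_i\Big]\Big| > n/\log^{1/3}(n)\Big) \le \frac{\mathrm{Var}\big(\sum_{i \in \mathcal{M}} X_i\big)}{n^2/\log^{2/3}(n)} \le \frac{c_1 n^2/\log(n)}{n^2/\log^{2/3}(n)} = o(1), \tag{14}$$

where the last inequality follows from $E[X_iX_j] \le E[X_i]$.

By the above claims, $|\mathcal{M}| = \Omega(n)$ while the number of
nodes with geographically close neighbors in the wireless-square
is $o(n)$. Hence, the transport load is $\Omega(n^{3/2 - 2\epsilon'}F)$. ■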



VI. Conclusions 

New technology (smartphones, etc.) has made content creation
easy: just a press of a button. Social networks, meanwhile,
make wide dissemination of the knowledge of that file
just as easy: a press of another button. Yet actual dissemination
of large files to many users can seriously burden a
wireless network. In the WAN setting, the time to disseminate
must scale linearly in the number of users. In this paper, we
consider a simple, low-overhead file dissemination algorithm
that exploits the peer-to-peer capabilities of many smartphones
and similar devices and, critically, exploits the very social
networks that spread knowledge of the file. We give a load-balancing
algorithm that uses the social network to schedule
transmissions so that the spatial capacity of the ad hoc network is
exploited without creating congestion or hot spots. We show
that the dissemination time scales like $\sqrt{n}$, significantly slower
growth than the linear scaling for WAN. Then, we show that if nodes
have knowledge of geographic position, this can be exploited
to further decrease file dissemination time. Finally, we show in
both cases that our algorithm performs close to an algorithm-independent
lower bound.

VII. Appendix A 

A. Proof for Lemma 12

We first quote Lemma 3.2 from [4]. This useful lemma
addresses how a neighborhood of a set in the random power
law graph grows. Specifically, if we have two sets $S$ and $T$,
what is the sum of weights of neighbors of $S$ which are also in
$T$? One important application is the setting where $T \approx G$, i.e.,
$T$ is almost the entire graph. In this case, we get an increase
factor of roughly $\tilde d$.

Lemma 16: ([4]) Given a random graph and two subsets $S$
and $T$, if

$$\frac{2c\,\mathrm{vol}_3(T)\,\mathrm{vol}(G)}{\delta^2\,\mathrm{vol}_2^2(T)} \le \mathrm{vol}(S) \tag{15}$$

$$\mathrm{vol}(S) \le \frac{\delta\,\mathrm{vol}_2(T)\,\mathrm{vol}(G)}{\mathrm{vol}_3(T)} \tag{16}$$

we have

$$\mathrm{vol}(\Gamma(S) \cap T) > (1 - 2\delta)\frac{\mathrm{vol}_2(T)}{\mathrm{vol}(G)}\mathrm{vol}(S) \tag{17}$$

with probability $1 - e^{-c}$, where $\Gamma(S)$ is the set of one-hop
neighbors of $S$.

Using this lemma, we provide a proof of Lemma 12, which
we used to lower bound the size of a node's immediate
neighborhood.

Proof. (Lemma 12) Consider node $x$'s neighborhood. Let
$S_i$ be the set of nodes whose distance to $x$ is $i$ and $S_0 = \{x\}$.
We will show that $\mathrm{vol}(S_{i+1}) \ge (1 - 2\delta)\tilde d\,\mathrm{vol}(S_i)$ for
$\delta = 1/4$ with probability $1 - o(n^{-1})$. To do this we need to
apply Lemma 16 inductively and choose $c = 3\log(n)$. We
may assume vol(5i) < ■n?'^ in the first elog^(n) steps. Since 
vol(G') = Q.{nlM) for all T, (O holds for all /3 > 3. 
We have only to verify (15).

First notice that $\mathrm{vol}(S_1) = \Omega(\log^2(n))$ with probability
$1 - o(n^{-1})$. This is true since, by Lemma 11, the node $x$ has at
least $K\log(n)/2$ neighbors with probability $1 - o(n^{-1})$ and
each neighbor has weight at least $K\log(n)$.

We next verify (15) for $\beta > 4$. Let $T_i$ be the set of all
potential nodes whose distance to $x$ is $i + 1$, i.e.,
$T_i = G \setminus \cup_{k=0}^{i} S_k$. Then, $\mathrm{vol}_2(T_i) = (1 + o(1))\mathrm{vol}_2(G)$ and
$\mathrm{vol}_3(T_i) = (1 + o(1))\mathrm{vol}_3(G)$. Thus,
$\mathrm{vol}_3(T_i)\,\mathrm{vol}(G)/\mathrm{vol}_2^2(T_i) = \Theta(1)$. Therefore, we
have the result by induction.



For the case $3 < \beta \le 4$, let $T_i^{(k)}$ be the intersection of $T_i$
and the set of nodes with weight smaller than $k\log(n)$. Then,
we have

$$\frac{2c\,\mathrm{vol}_3(T_i^{(k)})\,\mathrm{vol}(G)}{\delta^2\,\mathrm{vol}_2^2(T_i^{(k)})} = O(\log(n)) \tag{18}$$

and

$$\frac{\mathrm{vol}_2(T_i^{(k)})}{\mathrm{vol}_2(G)} = \frac{\int_{K\log(n)}^{k\log(n)} x^{2-\beta}\,dx}{\int_{K\log(n)}^{M} x^{2-\beta}\,dx} = (1 + o(1))\Big(1 - \Big(\frac{K}{k}\Big)^{\beta-3}\Big). \tag{19}$$

Therefore, by (18), (15) is true by induction. On
the other hand, $\mathrm{vol}_2(T_i^{(k)})/\mathrm{vol}(G) \approx \tilde d$ as $k$ becomes large
enough, by (19).

With the above results, we can conclude that the size of
the neighborhood grows by roughly a factor of $\tilde d/2$ with
probability $1 - o(n^{-1})$ for each step, from the second one to
the $\epsilon\log_{\tilde d}(n)$-th step. Since $\mathrm{vol}(S_{\epsilon\log_{\tilde d}(n)-1}) > 2\sigma n^{\epsilon(1-o(1))}$
for some constant $\sigma$, there are at least $\sigma n^{\epsilon}$ nodes within
distance $\epsilon\log_{\tilde d}(n)$ of $x$ in the social network. ■

B. Proof for Lemma 13

The proof of Lemma 13 depends on the following lemma
from [5], which provides large deviation results, both an upper
bound and a lower bound, for the sum of Bernoulli random
variables.

Lemma 17: ([5]) Let $X_i$ be a Bernoulli random variable
with parameter $p_i$. Suppose $\{X_i\}$ are independent. Let
$X = \sum_{i=1}^{n} a_iX_i$ and $\nu = \sum_{i=1}^{n} a_i^2p_i$. Then, we have

$$P(X < E[X] - c) \le \exp(-c^2/2\nu) \tag{20}$$

$$P(X > E[X] + c) \le \exp(-c^2/2(\nu + ac/3)) \tag{21}$$

where $a = \max\{a_1, a_2, \ldots, a_n\}$.

Proof. (Lemma 13) We first state the flow of the proof. In the
beginning, we show that we only need to consider an initial
node $x$ with weight $W$. We next define $S_i$ as the set of nodes
at distance $i$ from node $x$ and show that, for any $\delta > 0$,

$$\mathrm{vol}(S_i) \le W((1 + \delta)\tilde d)^i. \tag{22}$$

We in fact show $\mathrm{vol}(S_{i+1}) \le (1 + \delta)\tilde d\,\mathrm{vol}(S_i)$ with probability
$1 - O(\log^{-2}(n))$. To do so, we construct a set $T^*$ which
contains nodes with large weight and show $T^* \cap S_{i+1} = \emptyset$
with overwhelming probability. On the other hand, we use
Lemma 17 to bound the sum of weights in $S_{i+1}$ contributed
by nodes with small weight. To do this, we have to consider
three cases depending on $\beta$.

We now present the details of the proof. We first show the
condition $\mathrm{vol}(S_0) \le W$. Since $S_0 = \{x\}$, we need to show
the weight of $x$ does not exceed $W$ if $x$ is picked randomly. Indeed,

$$\int_{W}^{M} x^{-\beta}\,dx/C_1 = \frac{1 + o(1)}{C_1(\beta - 1)}W^{1-\beta} = o(\log^{-1}(n)), \tag{23}$$

where $C_1$ is the normalization constant $\int_m^M x^{-\beta}\,dx$. Thus, a
randomly picked node $x$ has weight smaller than $W$ with high
probability. Since by standard coupling arguments we see that
the growth of the neighborhood of $x$ is dominated by that of a node
with weight $W$, we simply take the weight of $x$ to be $W$ in
what follows.

We turn to show $\mathrm{vol}(S_{i+1}) \le (1 + \delta)\tilde d\,\mathrm{vol}(S_i)$ with probability
$1 - O(\log^{-2}(n))$. First, we give some definitions. Let
$m_i = (\mathrm{vol}(S_i)\log^{\beta}(n))^{1/(\beta-2)}$ and let $T^*$ be the set
of nodes with weight greater than $m_i$. We first show that
$T^* \cap S_{i+1} = \emptyset$ with probability $1 - O(\log^{-2}(n))$. Indeed,

$$P(S_{i+1} \cap T^* \ne \emptyset) \le \mathrm{vol}(S_i)\mathrm{vol}(T^*)/\mathrm{vol}(G) = \mathrm{vol}(S_i)\int_{m_i}^{M} x^{1-\beta}n\,dx/(C_1\mathrm{vol}(G)) = O(\log^{-2}(n)), \tag{24}$$

where the first inequality follows from the union bound.
Therefore, with high probability, $S_{i+1} \cap T^* = \emptyset$.

Define $T_i$ to be the set of unexplored nodes with weight
smaller than $m_i$ in the $i$-th step, i.e., $T_i = V \setminus (T^* \cup \cup_{k=0}^{i} S_k)$.
Thus, $S_{i+1}$ is a subset of $T_i$. For simplicity, we consider a
larger set $\tilde T_i$ which includes $T_i$ and virtual nodes $V_i$, where $V_i$
is chosen such that the number of nodes and their weights are
the same as those in $\cup_{k=0}^{i} S_k$. We allow nodes in $S_i$ to connect
to nodes in $V_i$ and, therefore, have a looser upper bound on
$\mathrm{vol}(S_{i+1})$. We first give some properties of $\tilde T_i$. Specifically,
we claim $\mathrm{vol}(\tilde T_i) = (1 + o(1))\mathrm{vol}(G)$ and
$\mathrm{vol}_2(\tilde T_i) = (1 + o(1))\mathrm{vol}_2(G)$. Indeed,

$$\mathrm{vol}(\tilde T_i) = \mathrm{vol}(G) - \mathrm{vol}(T^*) \ge \mathrm{vol}(G) - \int_{\log^{\beta/(\beta-2)}(n)}^{M} x^{1-\beta}n\,dx/C_1 = \mathrm{vol}(G) - O(n\log^{-1}(n)) = (1 + o(1))\mathrm{vol}(G), \tag{25}$$

where the inequality follows since $\mathrm{vol}(S_i)$ will increase by
a factor of at least $\tilde d/2$ in each step, as shown in Lemma 12.
Similarly, we have $\mathrm{vol}_2(\tilde T_i) = (1 + o(1))\mathrm{vol}_2(G)$.

Next, we give some properties of $S_{i+1}$. Our goal is to
find the expected weight of $S_{i+1}$, $E[Y_i]$, and the variable $\nu_i$
(defined below) to apply Lemma 17. To do this, define $X_j$ as
the indicator function that node $j$ is in $S_{i+1}$. Thus, by the union
bound and a fact (in the proof of Lemma 3.2 in [4]), we have

$$\mathrm{vol}(S_i)w_j/\mathrm{vol}(G) - (\mathrm{vol}(S_i)w_j/\mathrm{vol}(G))^2 \le P(X_j = 1) \le \mathrm{vol}(S_i)w_j/\mathrm{vol}(G). \tag{26}$$

Let $Y_i$ be the volume of $S_{i+1}$, i.e., $Y_i = \sum_{j \in \tilde T_i} w_jX_j$. Thus,
we have

$$E[Y_i] = (1 + o(1))\mathrm{vol}(S_i)\mathrm{vol}_2(\tilde T_i)/\mathrm{vol}(G). \tag{27}$$

Similarly, define $\nu_i = \sum_{j \in \tilde T_i} w_j^2P(X_j = 1)$. We have

$$\nu_i = (1 + o(1))\mathrm{vol}(S_i)\mathrm{vol}_3(\tilde T_i)/\mathrm{vol}(G). \tag{28}$$

Using the properties of $\tilde T_i$ and $S_{i+1}$, we now show the
inductive step, namely: $\mathrm{vol}(S_{i+1}) \le (1 + \delta)\tilde d\,\mathrm{vol}(S_i)$ with
probability $1 - o(\log^{-2}(n))$. Recalling Lemma 17, we have

$$P(Y_i > E[Y_i] + K_i) \le \exp\Big(-\frac{K_i^2}{2(\nu_i + m_iK_i/3)}\Big). \tag{29}$$

We need to consider three cases: $3 < \beta < 4$, $\beta = 4$,
and $\beta > 4$. In each case, we first estimate $\nu_i$ and then
compare $cm_i$ and $\sqrt{c\nu_i}$. According to this comparison,
we specify $K_i$ for each case and conclude our desired result.
Let $c = 10\log\log(n)$ and consider the three cases.
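Case 1 ($3 < \beta < 4$): First note that

$$\nu_i = O(\mathrm{vol}(S_i)\,m_i^{4-\beta}\log^{\beta-2}(n)). \tag{30}$$

Therefore, $cm_i \gg \sqrt{c\nu_i}$ for $n$ sufficiently large. Hence,
choosing $K_i = cm_i$, we have

$$P(Y_i > E[Y_i] + K_i) \le \exp\Big(-\frac{K_i^2}{4m_iK_i}\Big) = o(\log^{-2}(n)). \tag{31}$$

Note that by (27), we need only to show that
$K_i \le \delta\,\mathrm{vol}(S_i)\mathrm{vol}_2(\tilde T_i)/\mathrm{vol}(G)$. For this, it suffices to show

$$\mathrm{vol}(S_i) \ge \Big(\frac{c\,\mathrm{vol}(G)}{\delta\,\mathrm{vol}_2(\tilde T_i)}\Big)^{(\beta-2)/(\beta-3)}\log^{\beta/(\beta-3)}(n). \tag{32}$$

But this is true since the initial weight is greater than
$\log^{\beta/(\beta-3)}(n)$ and it increases by a factor of at least $\tilde d/2$ in
each step.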

Case 2 ($\beta = 4$): We have $\nu_i = O(\mathrm{vol}(S_i)\log(m_i)\log^2(n))$
and $cm_i \gg \sqrt{c\nu_i}$. Hence, with a computation similar to that
described in Case 1, we have the desired result.

Case 3 ($\beta > 4$): We have $\nu_i = \Theta(\mathrm{vol}(S_i)\log^2(n))$. By
direct computation, we have $\sqrt{c\nu_i} \gg cm_i$ provided the
initial weight is greater than $\log^{4/(\beta-4)}(n)$. Hence, we choose
$K_i = \sqrt{c\nu_i}$. Similarly, we just need to show that
$K_i \le \delta\,\mathrm{vol}(S_i)\mathrm{vol}_2(\tilde T_i)/\mathrm{vol}(G)$. This is true since, by (28),

$$\mathrm{vol}(S_i) \ge \frac{c\,\mathrm{vol}_3(\tilde T_i)\,\mathrm{vol}(G)}{\delta^2\,\mathrm{vol}_2(\tilde T_i)^2} = \Theta(\log\log(n)). \tag{33}$$

Note that the probability of failure in each step is
$O(\log^{-2}(n))$ and there are at most $\epsilon\log_{\tilde d}(n) + \Delta$ steps. Thus,
the sum of weights of nodes in the $(\epsilon\log_{\tilde d}(n) + \Delta)$-neighborhood
of $x$ is at most $2W\tilde d^{\Delta}n^{\epsilon+o(1)}$. We conclude that the desired
result holds with probability $1 - O(\log^{-1}(n))$, as each node
has weight at least $\log(n)$. ■
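The two tail bounds of Lemma 17, which drive the case analysis above, are easy to evaluate numerically. The following is a minimal sketch, assuming the standard form of the bounds with $\nu = \sum_i a_i^2p_i$; the function names and the example weights are hypothetical.

```python
import math

def variance_proxy(a, p):
    """nu = sum_i a_i^2 * p_i, as in Lemma 17."""
    return sum(ai * ai * pi for ai, pi in zip(a, p))

def lower_tail(c, nu):
    """Bound (20): P(X < E[X] - c) <= exp(-c^2 / (2 nu))."""
    return math.exp(-c * c / (2.0 * nu))

def upper_tail(c, nu, a_max):
    """Bound (21): P(X > E[X] + c) <= exp(-c^2 / (2 (nu + a_max * c / 3)))."""
    return math.exp(-c * c / (2.0 * (nu + a_max * c / 3.0)))

# Illustrative case: 100 unit-weight Bernoulli(1/2) variables, deviation c = 25.
a = [1.0] * 100
p = [0.5] * 100
nu = variance_proxy(a, p)
lo_bound = lower_tail(25.0, nu)
hi_bound = upper_tail(25.0, nu, max(a))
```

The extra term `a_max * c / 3` makes the upper-tail bound weaker than the lower-tail one for the same deviation, which is why the proof compares $cm_i$ with $\sqrt{c\nu_i}$ before choosing $K_i$.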

VIII. Appendix B 

A. Proofs of lemmas in Section |73 

In this section, we first show Lemma 9, Lemma 10 and
Lemma 11. The proofs of these lemmas use techniques for
balls-and-bins problems. To solve problems like these, in
general, we first find a proper target function and write the
target function as a sum of indicator functions. We next use
a large deviation result, e.g., Lemma 17, to show the target
function is concentrated around its mean. The proofs of the
three lemmas follow the above procedure and are shown
below.

Proof. (Lemma 9) Let $X_i$ be the indicator function that
the distance between the first node and node $i$ is smaller
than $\sqrt{64n\log(n)/\pi k}$. Let $Y = \sum_{i=2}^{k+1} X_i$, i.e., $Y$ is the
number of nodes with distance to the first node smaller than
$\sqrt{64n\log(n)/\pi k}$. To show this lemma, we first find the
mean of $Y$ and then show $Y$ is around $E[Y]$ with overwhelming
probability. Indeed,

$$E[Y] = \sum_{i=2}^{k+1} E[X_i] = \sum_{i=2}^{k+1} P(X_i = 1) \ge 16\log(n), \tag{34}$$

where the last inequality corresponds to the worst case, in which
the first node is located at a corner of the square.

To apply Lemma 17, we choose $c = E[Y]/2$ and observe
$\nu = E[Y]$, and get

$$P(Y < E[Y]/2) \le \exp(-(E[Y]/2)^2/2E[Y]) \le n^{-2}. \tag{35}$$

Equation (35) implies that at least $8\log(n)$ nodes are close
to the first node with probability $1 - n^{-2}$, and we have the
lemma. ■
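A quick Monte Carlo check of this balls-and-bins statement can be sketched as follows. It assumes the wireless-square has area $n$ (side $\sqrt{n}$), which is consistent with the bound $16\log(n)$ in (34) but is an assumption here, and it places the reference node at the corner, the worst case; the function name and parameters are illustrative.

```python
import math
import random

def count_close_nodes(n, k, seed=0):
    """Place k nodes uniformly on a sqrt(n) x sqrt(n) square (area n is assumed)
    and count those within sqrt(64 n log(n) / (pi k)) of a node at the corner (0, 0)."""
    rng = random.Random(seed)
    side = math.sqrt(n)
    radius = math.sqrt(64.0 * n * math.log(n) / (math.pi * k))
    count = 0
    for _ in range(k):
        x, y = rng.uniform(0.0, side), rng.uniform(0.0, side)
        if math.hypot(x, y) < radius:
            count += 1
    return count

n, k = 10_000, 2_000
close = count_close_nodes(n, k)
threshold = 8 * math.log(n)   # the lemma's guarantee on the number of close nodes
```

Under these assumptions the expected count is about $16\log(n)$, so `close` should comfortably exceed `threshold`.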

Similar to the above proof, we show the remaining two lemmas.

Proof. (Lemma 10) We may assume $A > 10\log(n)$. Let $Y_i$
be the indicator function that node $i$ falls in that rectangle. Let
$X = \sum_{i=1}^{n} Y_i$, i.e., $X$ is the number of nodes falling in the
rectangle. To show this lemma, we first find the mean of $X$
and then show $X$ is around $E[X]$ with overwhelming probability.
Indeed,

$$E[X] = \sum_{i=1}^{n} E[Y_i] = \sum_{i=1}^{n} P(Y_i = 1) > 10\log(n). \tag{36}$$

To apply Lemma 17, we choose $c = E[X]$ and observe
$\nu = E[X]$, and get

$$P(X > 2E[X]) \le \exp(-(E[X])^2/(8E[X]/3)) \le n^{-2}. \tag{37}$$

Equation (37) implies that at most $2A$ nodes fall
in the rectangle with probability $1 - n^{-2}$, and we have the
lemma. ■
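The same kind of check works for the rectangle count. The sketch below again assumes a square of area $n$ and reads $A$ as the area of the rectangle (so that $E[X] = A$); both are assumptions made for illustration, as are the function name and the chosen dimensions.

```python
import math
import random

def count_in_rectangle(n, width, height, seed=0):
    """Drop n nodes uniformly on a sqrt(n) x sqrt(n) square (area n is assumed)
    and count those landing in the width x height rectangle anchored at the origin."""
    rng = random.Random(seed)
    side = math.sqrt(n)
    count = 0
    for _ in range(n):
        x, y = rng.uniform(0.0, side), rng.uniform(0.0, side)
        if x < width and y < height:
            count += 1
    return count

n = 10_000
width, height = 50.0, 2.0          # area A = 100, just above 10 * log(n)
area = width * height
inside = count_in_rectangle(n, width, height)
```

With these values the expected count equals the area, and the lemma's bound $2A$ leaves a wide margin.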

Proof. (Lemma 11) Let $Y_j$ be the indicator function that
$(i, j) \in E$. Let $X = \sum_j Y_j$, i.e., $X$ is the number of one-hop
neighbors of node $i$. To show this lemma, we first find
the mean of $X$ and then show $X$ is around $E[X]$ with overwhelming
probability. Indeed,

$$E[X] = \sum_j E[Y_j] = (1 + o(1))w_i. \tag{38}$$

To apply Lemma 17, we choose $c_1 = E[X]$ for the upper bound
and $c_2 = E[X]/2$ for the lower bound. Observe $\nu \approx E[X]$, and
get

$$P(X > 2E[X]) \le \exp(-(E[X])^2/(8E[X]/3)) \le o(n^{-1}) \tag{39}$$

$$P(X < E[X]/2) \le \exp(-(E[X]/2)^2/2E[X]) \le o(n^{-1}) \tag{40}$$

One may observe that the $o(1)$ term does not affect the results.
Therefore, with equations (39) and (40), we have the
lemma. ■
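The degree concentration can be illustrated by sampling one node's degree. This sketch assumes a Chung-Lu style rule in which edge $(i, j)$ appears independently with probability $\min(1, w_iw_j/\mathrm{vol}(G))$; that rule, the function name and the uniform weights are assumptions made for illustration.

```python
import random

def sample_degree(i, w, seed=0):
    """Degree of node i when each edge (i, j), j != i, appears independently with
    probability min(1, w[i] * w[j] / sum(w)) (an assumed Chung-Lu style rule)."""
    rng = random.Random(seed)
    vol = sum(w)
    return sum(
        1
        for j in range(len(w))
        if j != i and rng.random() < min(1.0, w[i] * w[j] / vol)
    )

# 2000 nodes, each with expected degree (weight) 200.
w = [200.0] * 2000
deg = sample_degree(0, w)
```

The sampled degree should fall between half and twice the weight of the node, matching the two-sided statement established by (39) and (40).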



B. Proofs of Theorem 4 and Theorem 5

In this section, we present our proofs for Theorem 4 and
Theorem 5: the performance of our algorithm and the lower
bound on the file dissemination time of any possible algorithm.
We consider a random power law graph with $\beta > 2$, and nodes
are only allowed to download the file from nodes at most two
hops away. In Theorem 4, we set the input of Algorithm 1
as $\epsilon = 0$ and $\mathcal{L} = \infty$. Thus, nodes always send requests to
one-hop neighbors on the social-graph. We show that our load-balancing
scheme, exploiting the property that social networks
have small diameters, guarantees that the file dissemination time
scales like $\sqrt{n}$. In the proof of Theorem 5, we adopt an
approach similar to that in Theorem 8, in which we find a lower
bound on the transport load. We show that the performance
of our algorithm only differs from the best possible file
dissemination time by a factor of $n^{\xi}$ for any $\xi > 0$. Our
proofs are presented as follows.

Proof. (Theorem 4) We first claim that each transmission has
a rate $\Omega(1/\sqrt{n})$. To show this, consider a horizontal highway
node and its corresponding stripe. Note that these $n$ nodes are
placed uniformly and independently on the square. By Lemma
10, there are $O(\sqrt{n})$ nodes in this stripe with high probability.
On the other hand, each node only generates at most 6 flows.
Therefore, each flow through the horizontal highway node can
have a rate of $\Omega(1/\sqrt{n})$. A similar argument applies to vertical
highway nodes. As this is true for all highway nodes with
high probability, we have the claim. In addition, there exists a
constant $c_1$ such that each node can receive the file in $c_1\sqrt{n}F$
time slots from the time the transmission starts.

As in the earlier proof by induction, we show the theorem
by induction on $k$, the distance from a node to the source
on the social-graph. Our claim is that nodes at distance $k$ to the
source can receive the file in $c_1k\sqrt{n}\log_2(n)F$ time slots.
It is clear that the base case is true for $k = 1$. Suppose
this is true for $k - 1$ and consider nodes at distance $k$ to
the source. Since each such node is not inactive at time
$c_1(k - 1)\sqrt{n}\log_2(n)F$, the node must be in a binary tree
with an active node as the root. Therefore, this node has to
wait at most $\log_2(n) - 1$ transmissions before getting served.
Thus, the node can receive the file by time $c_1k\sqrt{n}\log_2(n)F$.
Hence, by mathematical induction, we have the claim. Note
that the diameter of the social graph is $O(\log(n))$. All nodes
can get the file in $O(\sqrt{n}\log^2(n)F)$ time slots. ■

Proof. (Theorem 5) To apply Lemma 15, we just need to
show that the transport load is $\Omega(n^{3/2-\xi}F)$ for any $\xi > 0$
with probability $1 - o(1)$. The idea is to show there are $\Theta(n)$
nodes in the largest component which only have small-sized
2-neighborhoods. Thus, these nodes must download the file from
nodes which are geographically far away from them on the
wireless-square. We first state the flow of the proof. In the first
step, we claim we only need to consider a random power law
graph with minimum expected degree $m = K\log(n)$ for some
$K > 10$. More precisely, we only consider nodes with weight in
the region $[K\log(n), 2K\log(n)]$ in such graphs. We next show
that only a vanishing fraction of them can find
geographically proximate one-hop or two-hop neighbors. In the
end, we show that $\Theta(n)$ of them are indeed in the largest
component and thus have the theorem.

We first show that we may assume that the minimum
expected degree is $m = K\log(n)$ for some constant $K > 10$.
To do this, consider the original minimum expected degree
$m \le 10\log(n)$ and the original expected degree sequence
$w = (w_1, \ldots, w_n)$. Let $\tilde w$ be the expected degree sequence
for $m = K\log(n)$ for some $K > 10$. Observe that the expected
degrees are increasing in $d$, so $w$ is smaller than
$\tilde w$ term by term. Thus, by coupling, the random power law
graph generated by $\tilde w$ contains the original random power law
graph stochastically.

Next, we show that nodes with weight in that region
have a small-sized 2-neighborhood with high probability. This
property is important, as a small-sized neighborhood implies
it is hard to find geographically proximate neighbors. Consider
$2\xi > \eta > 0$. Let $\mathcal{N}_i$ be the set of nodes that node $i$ can
reach in 2 hops in the social-graph. We claim
$P(|\mathcal{N}_i| \le 10Kn^{\eta}\log(n)) = 1 - o(1/\log(n))$. Indeed, by Lemma 11,
node $i$ has at most $4K\log(n)$ neighbors on the social-graph
with probability $1 - o(n^{-1})$. Further, the probability that one
of its neighbors is of weight greater than $n^{\eta}$ is smaller than

$$\frac{2K\log(n)\int_{n^{\eta}}^{M} x^{1-\beta}n\,dx}{\mathrm{vol}(G)\int_{K\log(n)}^{M} x^{-\beta}\,dx} = o(1/\log(n)). \tag{41}$$

Thus, the sum of weights of its neighbors is smaller than
$4Kn^{\eta}\log(n)$ with probability $1 - o(1/\log(n))$. Hence, by
Lemma 11 again, we have the claim.

Let $\mathcal{M}$ be the set of nodes with expected degree in the range
$[K\log(n), 2K\log(n)]$. Then,

$$|\mathcal{M}| = \frac{\int_{K\log(n)}^{2K\log(n)} x^{-\beta}n\,dx}{\int_{K\log(n)}^{M} x^{-\beta}\,dx} = (1 + o(1))(1 - 2^{1-\beta})\,n. \tag{42}$$

We next claim that only $o(n)$ nodes in $\mathcal{M}$ have geographically
proximate neighbors (the distance between the neighbors and
the node is smaller than $\sqrt{n/(10K\pi n^{\eta}\log^2(n))}$). This property
along with (42) implies that almost all nodes in $\mathcal{M}$ do not have
geographically proximate neighbors. We show this property as
follows. We first find the expected number of nodes in $\mathcal{M}$
which have geographically proximate neighbors. Let $X_i$ be the
indicator function that the Euclidean distance from node $i$ to $\mathcal{N}_i$
on the wireless-square is smaller than $\sqrt{n/(10K\pi n^{\eta}\log^2(n))}$.
Therefore, we have, for $i \in \mathcal{M}$ and $n$ large enough,

$$P(X_i = 1) \le 1/\log(n) + o(1/\log(n)) \le 2/\log(n), \tag{43}$$

since the first term bounds the probability that $|\mathcal{N}_i| \le 10Kn^{\eta}\log(n)$
and $X_i = 1$, and the second term is the probability that
$|\mathcal{N}_i| > 10Kn^{\eta}\log(n)$. Therefore, we have

$$E\Big[\sum_{i \in \mathcal{M}} X_i\Big] \le 2(1 - 2^{1-\beta})\,n/\log(n). \tag{44}$$

Next, we show a concentration result. We claim that

$$P\Big(\Big|\sum_{i \in \mathcal{M}} X_i - E\Big[\sum_{i \in \mathcal{M}} X_i\Big]\Big| > n/\log^{1/3}(n)\Big) = o(1).$$

Indeed, by Chebyshev's inequality, we have

$$P\Big(\Big|\sum_{i \in \mathcal{M}} X_i - E\Big[\sum_{i \in \mathcal{M}} X_i\Big]\Big| > n/\log^{1/3}(n)\Big) \le \frac{\mathrm{Var}\big(\sum_{i \in \mathcal{M}} X_i\big)}{n^2/\log^{2/3}(n)} \le \frac{2n^2/\log(n)}{n^2/\log^{2/3}(n)} = o(1), \tag{45}$$

where the last inequality follows from $E[X_iX_j] \le E[X_i]$.

At the end, let $S$ be the set of nodes in the largest
component. Since almost all nodes are in the largest
component, we have $|S \cap \mathcal{M}| = \Theta(n)$. Hence, the result
follows from the above fact that only $o(n)$ nodes in $\mathcal{M}$ have
geographically proximate one-hop or two-hop neighbors. ■